ABSTRACT

This reflection paper challenges current test-scoring practices on the grounds that most wrong-answer selections are
thoughtful, not random, and presents research supporting this proposition. An alternative test-scoring system is presented
and described, and its outcomes are discussed. This new scoring system increases the number of variables considered,
reduces the mesh of the analytical screen and provides considerably more information to inform teaching.
1. INTRODUCTION

This paper takes a critical look at current multiple-choice test-scoring practices. Consider Dr. Friedrich A. 
von Hayek’s (1974) Nobel Prize acceptance lecture “The Pretense of Knowledge.” 

There is much reason to be apprehensive about the long run dangers created in a much wider field [than 
economics] by the uncritical acceptance of assertions which have the appearance of being scientific [because] 
what looks superficially to be like the most scientific procedures are often the most unscientific ... Beyond 
this, there are definite limits to what science can achieve [arising] from our inability to quantify some 
important variables. 

This paper quantifies these intractable variables in a novel manner. 


2. THE “RIGHT-WRONG” SCORING PERSPECTIVE (LEARNING HYPOTHESIS I)

We currently presume that multiple-choice test scores estimate what students know as a cumulative
proportion of an epistemological domain. The items are a representative sampling of course content, and the
frequencies of “right” answers are assumed to be proportional to the examinees’ “knowledge.” Options are often
designed to reflect common procedural errors or common misconceptions (e.g., concept inventories; Halloun &
Hestenes, 1985).

Most “wrong” answers are assumed to be randomly selected and unrelated to the “right” answers, which makes
scoring a dichotomous process. The frequency of “right” answers is then taken to be a necessary and sufficient
information base about examinees’ subject-matter knowledge.

An editorial in Science (Coffey & Alberts, 2013) notes:

“The [Common Core] contains a vast number of core disciplinary ideas and subideas. 

Current measurements and approaches do not allow these performances to be assessed 
easily...” 

Under this hypothesis we can expect several supporting observations, the most important of which are:

1. The distribution of scores will resemble a typical normal bivariate distribution (Frame 1). 

2. The changes of answer selections from one administration to the next (whether the same test or a 
parallel one) will be from any “wrong” answer to the “right” one (Frame 2). 
3. Changes of answer and increases in learning will progress in equivalent directions
(the equivalency assumption; Frame 3).
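Prediction 2 can be checked by tallying, item by item, how a student's selection changed between two administrations. The Python sketch below is a minimal illustration, assuming responses are stored as strings of option letters; the function name `classify_changes` and the five-item data are hypothetical, not the author's procedure. Under Hypothesis I, the "right->wrong" and "wrong->other wrong" cells should hold only random fluctuation.

```python
from collections import Counter


def classify_changes(first, second, key):
    """Tally how one student's answers changed between two administrations.

    first, second: option letters chosen on each item at each administration.
    key: the keyed ("right") option for each item.
    """
    if not (len(first) == len(second) == len(key)):
        raise ValueError("all three sequences must cover the same items")
    tally = Counter()
    for before, after, right in zip(first, second, key):
        if before == after:
            tally["unchanged"] += 1
        elif after == right:
            tally["wrong->right"] += 1       # the only change Hypothesis I expects
        elif before == right:
            tally["right->wrong"] += 1       # contrary to Hypothesis I
        else:
            tally["wrong->other wrong"] += 1  # a shift between distractors


    return tally


# Hypothetical five-item test, invented for illustration only.
print(classify_changes("ABDCB", "ACCDC", "ABCDA"))
```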

Current practice presumes there can be only one reasonable “right” answer to a question. Learning
requires the exchange of “wrong” answers for “right” ones (Frame 1). Answering is a dichotomy whose
distribution of total frequencies resembles the limit of the binomial expansion $(a + b)^n$ as $n \to \infty$. This is
the normal bivariate distribution (Frame 2). The only meaningful changes are those in the total-correct
scores (Frame 3).
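Under Hypothesis I each item is scored right (probability p) or wrong, so a total-correct score over n items follows the binomial expansion of (a + b)^n with a = p and b = 1 - p; as n grows it approaches a normal curve with mean np and variance np(1 - p). The short Python check below assumes independent items sharing one common p, a simplification not stated in the text, and shows the largest gap between the two distributions shrinking as the test lengthens.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k right answers out of n independent items."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_density(x, mean, sd):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_gap(n, p):
    """Largest pointwise gap between the binomial distribution of
    total-correct scores and its normal approximation N(np, np(1 - p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(k, n, p) - normal_density(k, mean, sd))
        for k in range(n + 1)
    )


# Longer tests: the gap to the normal curve shrinks.
for n in (10, 40, 160):
    print(n, round(max_gap(n, 0.6), 4))
```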


3. “MOST ANSWER SELECTIONS ARE THOUGHTFUL” (LEARNING HYPOTHESIS II)

Education suffers from non-measurability as much as any other discipline. We can, however, compare live data
with the hypothetical patterns postulated under Hypothesis I. Which theory stands the test of observation? If live
data show patterns different from those expected under Hypothesis I, then Hypothesis I must be rejected in favor
of Hypothesis II.
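One simple way to confront Hypothesis I with live data is to test its claim that “wrong” answers are randomly selected. If "random" is read as equally likely, examinees who miss an item should spread roughly evenly over its distractors. The Python sketch below applies a standard chi-square goodness-of-fit test; this is a swapped-in textbook technique, not the author's interview method. It assumes exactly three distractors (a four-option item), so there are 2 degrees of freedom and the p-value is exactly exp(-chi_square / 2). The function name and the counts are hypothetical.

```python
import math


def distractor_uniformity_test(counts):
    """Chi-square test of the Hypothesis I prediction that wrong answers are
    spread evenly (at random) over an item's distractors.

    counts: number of examinees choosing each of three distractors.
    Returns (chi_square, p_value).
    """
    if len(counts) != 3 or sum(counts) == 0:
        raise ValueError("this sketch assumes three distractors and some data")
    expected = sum(counts) / len(counts)  # equal share under random selection
    chi_square = sum((c - expected) ** 2 / expected for c in counts)
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-chi_square / 2)
    return chi_square, p_value


# Hypothetical distractor counts, invented for illustration only.
print(distractor_uniformity_test([60, 25, 15]))
```

A small p-value says the distractor choices are unlikely to be random in this sense; it does not by itself say why they were chosen, which is where interviews come in.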


4. WHAT DO MULTIPLE-CHOICE TESTS ACTUALLY MEASURE?

Studies extending over more than half a century, in which students were interviewed about their non-random
“wrong” answer selections (Frame 4), involved thousands of students across a broad age range, more than one
continent and more than one subject-matter area. They showed item interpretation to be involved in answer
selection, through the following observations:

1. Teaching interpretation skills causes huge increases in students’ learning motivation (Powell, 
2010a). 

2. Using adults’ written explanations of their selections, Powell (1968) showed that the explanations often
predicted the selections.

3. Testing 550 students from the third through the eighth grade with the Gorham (1956) test showed (Frame
4; Powell, 1977) that subsets of “wrong” answers were systematically and perfectly ordered with chronological age
(CA; Frame 5). More than half of these correlations are conjoint (shown as double-headed arrows; Luce &
Tukey, 1964).

4. Giving this test twice (October-March) to 2,000+ students (third grade through the end of high 
school) supported this developmental sequence. We defined students who followed the observed 
developmental sequence as showing increasing cognitive maturity (CM). This study showed four 
developmental pathways, refuting the dichotomous data assumption and displaying the dynamics of 
learning (Frame 6): 

a. A Piaget-like (1953) normal development (about 40% of the students), 

b. Declining CM (about 30% of the students), 

c. Stalling at the literal (Concrete Operations) thinking level, with a preadolescent interpretation
of questions (about 20% of the students), and

d. A mature-age, multidimensional expansion of thinking skills beyond two-valued logic, shown
by shifting from a “right” answer to a “wrong” one (the remaining 10% of the students).

5. Using a representative sample of 52 of these roughly 3,000 students, we drew developmental profiles of
their CM changes (Frame 7). Frame 7 shows one student whose CM declined while the total-correct score
increased, and another student with the opposite pattern of change. These two results, contrary to
expectation, are sufficient to refute Hypothesis I. Powell and Powell showed that the 10% can be increased
to at least 40%.

6. Using 16,000+ students from years 4, 6 and 8 in India and two tests, one in science and one in
mathematics, Powell, Bernauer and Agnihotri (2012) showed that item interpretation was the key
ingredient in answer selection across the different tests. The study also identified developmental sequences
as well as procedural, information-style and cultural bases for the selection of “wrong” answers (no frame).

7. Using the multidimensional transition to identify higher-order changes among students, Powell and
Powell (2012) showed that when teaching interpretation skills replaces transmitting information,
multidimensional thinking among college undergraduates increases six-fold (no frame).
8. From a representative sample of the data from study 4, Powell (2013) showed (Frame 8) that
changes in cognitive maturity (CM) are unrelated to changes in total-correct score, and that the latter
score misclassified more than two-thirds of the students relative to the corresponding results drawn from
the same test. (Compare this result from live data with the result anticipated from random “wrong”
answering, given in Frame 3.) Current scoring practice is invalid!
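Under the equivalency assumption (Frame 3), a student's total-correct score and CM should change in the same direction between administrations, so the share of students for whom the two directions diverge is one crude indicator of how often total-correct scoring misrepresents change. The Python sketch below assumes each student has a CM score and a total-correct score from each of two administrations; comparing only the sign of each change is an illustrative simplification, not the procedure of Powell (2013), and the function names and four-student data are hypothetical.

```python
def direction(before, after):
    """Sign of a change between administrations: +1 up, -1 down, 0 none."""
    return (after > before) - (after < before)


def disagreement_rate(cm_scores, total_scores):
    """Fraction of students whose total-correct score moved in a different
    direction from their cognitive-maturity (CM) score.

    cm_scores, total_scores: lists of (first, second) score pairs,
    one pair per student, in the same student order.
    """
    if len(cm_scores) != len(total_scores) or not cm_scores:
        raise ValueError("need matching, non-empty score lists")
    disagreements = sum(
        direction(*cm) != direction(*tc)
        for cm, tc in zip(cm_scores, total_scores)
    )
    return disagreements / len(cm_scores)


# Hypothetical scores for four students, invented for illustration only.
cm = [(3, 5), (5, 2), (4, 4), (2, 3)]
tc = [(20, 26), (18, 24), (22, 25), (25, 21)]
print(disagreement_rate(cm, tc))
```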


5. CONCLUSIONS AND IMPLICATIONS

When all answer-selection variables are included, multiple-choice tests measure interpretation skills
independently of knowledge of the questions’ subject-matter content, except that fluency with the
technical language of each item is partially involved. These skills are not dichotomous. Instead, they can
expand into logics of more than two values, and this shift is observable using the Thurs ρ. Adding the
“wrong” answer variables makes these data multinomial, not binomial. This invalidates current test-scoring
practice, which discards two-thirds to three-fourths of the data available from each student in a single
administration of any test, and more than this when a test is administered more than once to expose the
learning dynamics. When “wrong” answers are omitted from the scoring process, this loss of data destroys
the integrity of the matrix.
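The contrast between binomial and multinomial data can be made concrete by encoding the same responses both ways. In the Python sketch below (function names hypothetical), dichotomous scoring keeps one right/wrong indicator per item, while multinomial encoding keeps an indicator for every option. One possible reading of the "two-thirds to three-fourths" figure, assumed here and not stated in the text, is that it counts option indicators on three- to four-option items: dichotomous scoring retains only the keyed option's indicator, so it ignores (k - 1)/k of the k indicators on each item.

```python
def dichotomous(responses, key):
    """Current practice: 1 if the keyed option was chosen, else 0."""
    return [int(r == k) for r, k in zip(responses, key)]


def multinomial(responses, options):
    """Keep every selection: one indicator per option for each item."""
    return [[int(r == o) for o in options] for r in responses]


def share_discarded(n_options):
    """Share of option indicators that dichotomous scoring ignores
    (ASSUMPTION: 'data' is counted as option indicators per item)."""
    return (n_options - 1) / n_options


# Hypothetical three-item response string and key.
print(dichotomous("BCA", "BAA"))
print(multinomial("BCA", "ABCD"))
print(share_discarded(3), share_discarded(4))
```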


6. THE FIX 

The algorithm for calculating the Thurs ρ (Frame 9), which bypasses linear dependency and detects non-linear
data structure, is now in the public domain for private use. The interpretation strategies identified from the
interview data of the 1977 study, and the data samples from the 1992 and 2013 studies, may be requested from
the author. The author will consult with any interested party on applying the ρ or on effective educational
improvement strategies.